Temperature-Free Loss Function for Contrastive Learning
As one of the most promising methods in self-supervised learning, contrastive learning has achieved a series of breakthroughs across numerous fields. A predominant approach to implementing contrastive learning is to apply the InfoNCE loss: by capturing the similarities between pairs, the InfoNCE loss enables learning representations of data. Despite its success, adopting the InfoNCE loss requires tuning a temperature, a core hyperparameter for calibrating similarity scores. Although several studies have emphasized its significance and its sensitivity to performance, searching for a valid temperature requires extensive trial-and-error experiments, which increases the difficulty of adopting the InfoNCE loss. To address this difficulty, we propose a novel method for deploying the InfoNCE loss without a temperature. Specifically, we replace temperature scaling with the inverse hyperbolic tangent function, resulting in a modified InfoNCE loss. In addition to hyperparameter-free deployment, we observed that the proposed method even yielded a performance gain in contrastive learning. Our detailed theoretical analysis reveals that the current practice of temperature scaling in the InfoNCE loss causes serious problems in gradient descent, whereas our method provides desirable gradient properties. The proposed method was validated on five contrastive learning benchmarks, yielding satisfactory results without temperature tuning.
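The substitution the abstract describes can be sketched for a single anchor as follows. This is a minimal illustration, not the paper's full formulation (which may differ in batch construction and other details); only the replacement of the scaled similarity s/τ by atanh(s) follows the abstract, and the variable names are our own.

```python
import numpy as np

def info_nce(sim, temperature):
    # Standard InfoNCE for one anchor: cosine similarities are scaled
    # by 1/temperature; the positive pair sits at index 0.
    logits = sim / temperature
    return -logits[0] + np.log(np.exp(logits).sum())

def temperature_free_info_nce(sim):
    # Sketch of the proposed variant: temperature scaling is replaced
    # by the inverse hyperbolic tangent, which maps cosine similarities
    # in (-1, 1) to the whole real line without any hyperparameter.
    logits = np.arctanh(sim)
    return -logits[0] + np.log(np.exp(logits).sum())

# One anchor: similarity to its positive (index 0) and three negatives.
sims = np.array([0.9, 0.1, -0.2, 0.05])
loss_standard = info_nce(sims, temperature=0.1)
loss_temp_free = temperature_free_info_nce(sims)
```

Note that atanh grows steeply as similarities approach ±1, which plays the calibrating role that 1/τ plays in the standard loss, but without a tunable constant.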
Privacy-Preserving Low-Rank Adaptation for Latent Diffusion Models
Luo, Zihao, Xu, Xilie, Liu, Feng, Koh, Yun Sing, Wang, Di, Zhang, Jingfeng
Low-rank adaptation (LoRA) is an efficient strategy for adapting latent diffusion models (LDMs) on a private dataset to generate specific images by minimizing the adaptation loss. However, LoRA-adapted LDMs are vulnerable to membership inference (MI) attacks, which can judge whether a particular data point belongs to the private dataset, thus leading to privacy leakage. To defend against MI attacks, we first propose a straightforward solution: Membership-Privacy-preserving LoRA (MP-LoRA). MP-LoRA is formulated as a min-max optimization problem in which a proxy attack model is trained by maximizing its MI gain while the LDM is adapted by minimizing the sum of the adaptation loss and the MI gain of the proxy attack model. However, we empirically find that MP-LoRA suffers from unstable optimization, and we theoretically analyze that the potential reason is the unconstrained local smoothness, which impedes privacy-preserving adaptation. To mitigate this issue, we further propose Stable Membership-Privacy-preserving LoRA (SMP-LoRA), which adapts the LDM by minimizing the ratio of the adaptation loss to the MI gain. Furthermore, we theoretically prove that the local smoothness of SMP-LoRA can be constrained by the gradient norm, leading to improved convergence. Our experimental results corroborate that SMP-LoRA can indeed defend against MI attacks and generate high-quality images. Our code is available at https://github.com/WilliamLUO0/StablePrivateLoRA.
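The min-max structure of MP-LoRA can be illustrated with a scalar toy problem. The two loss functions below are illustrative stand-ins, not the paper's actual objectives; only the alternating pattern — the proxy attacker ascending on its MI gain, the LDM descending on adaptation loss plus MI gain — follows the abstract.

```python
# Toy scalar illustration of MP-LoRA's min-max structure. The LDM's
# LoRA weights and the proxy attack model are each reduced to a single
# scalar parameter; the loss surfaces are made-up stand-ins.

theta = 1.0   # stands in for the LDM's LoRA parameters
phi = 0.5     # stands in for the proxy attack model's parameters
lr = 0.1

def adaptation_loss(theta):
    return (theta - 0.2) ** 2          # toy adaptation objective

def mi_gain(theta, phi):
    return -(phi - theta) ** 2 + 1.0   # toy MI gain, peaks at phi = theta

for _ in range(200):
    # Proxy attacker: maximize its MI gain (gradient ascent on phi).
    grad_phi = -2.0 * (phi - theta)
    phi += lr * grad_phi
    # LDM: minimize adaptation loss + MI gain (gradient descent on theta).
    grad_theta = 2.0 * (theta - 0.2) + 2.0 * (phi - theta)
    theta -= lr * grad_theta
```

In this toy setting the alternation settles at the adaptation optimum with the attacker tracking it; the abstract's point is that in the real MP-LoRA problem the analogous dynamics can be unstable, motivating the ratio-based SMP-LoRA objective.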
On the Ideal Number of Groups for Isometric Gradient Propagation
Kim, Bum Jun, Choi, Hyeyeon, Jang, Hyeonah, Kim, Sang Woo
Recently, various normalization layers have been proposed to stabilize the training of deep neural networks. Among them, group normalization is a generalization of layer normalization and instance normalization by allowing a degree of freedom in the number of groups it uses. However, to determine the optimal number of groups, trial-and-error-based hyperparameter tuning is required, and such experiments are time-consuming. In this study, we discuss a reasonable method for setting the number of groups.

These behave similarly in that they apply mean and standard deviation (std) normalization and an affine transform. The difference lies in the units used for computing the mean and std. For example, for n features, layer normalization computes a single mean and std for normalization, whereas instance normalization computes n means and stds. Meanwhile, group normalization partitions n features into G groups to compute G means and stds. From this perspective, layer normalization is a special case of group normalization for G = 1, and instance normalization is a special case of group normalization for G = n. Thus, group normalization is more comprehensive and has a degree of freedom from the setting of the number of groups.
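The special-case relationship described above can be checked directly. The following is a minimal NumPy sketch of group normalization (the affine transform is omitted, and the feature layout is a simplified assumption): G = 1 recovers layer normalization, and G = n recovers instance normalization.

```python
import numpy as np

def group_norm(x, G, eps=1e-5):
    # x has shape (n, m): n features (channels) with m spatial positions.
    # Partition the n features into G groups, then normalize each group
    # by its own mean and std (the affine transform is omitted here).
    n, m = x.shape
    assert n % G == 0, "n must be divisible by G"
    g = x.reshape(G, (n // G) * m)
    mean = g.mean(axis=1, keepdims=True)
    std = g.std(axis=1, keepdims=True)
    return ((g - mean) / (std + eps)).reshape(n, m)

x = np.array([[1.0, 2.0], [3.0, 5.0], [4.0, 8.0], [2.0, 6.0]])  # n=4, m=2

layer_norm = group_norm(x, G=1)     # one mean/std over all features
two_groups = group_norm(x, G=2)     # two groups of two features each
instance_norm = group_norm(x, G=4)  # one mean/std per feature
```

Every intermediate value of G between these two extremes yields a distinct normalization, which is exactly the degree of freedom, and the tuning burden, that the abstract refers to.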